Abstract

Adaptive codes have been introduced in [5] as a new class of non-standard variable-length codes. These codes associate variable-length codewords to symbols being encoded depending on the previous symbols in the input data string. A new data compression algorithm, called EAH, has been introduced in [7], where we have empirically shown that for a large class of input data strings, this algorithm substantially outperforms the well-known Lempel-Ziv universal data compression algorithm.

In this paper, we translate the EAH encoder into automata theory. 
1 Introduction



New algorithms for data compression, based on adaptive codes of order one [5] and Huffman codes, have been introduced in [6], where we have empirically shown that for a large class of input data strings, these algorithms substantially outperform the well-known Lempel-Ziv universal data compression algorithm [12].

EAH (Encoder based on Adaptive codes and Huffman codes) has been introduced in [7] as an improved version of the algorithms presented in [6]. The experiments carried out so far [6,7] indicate that EAH is a highly promising data compression algorithm, as the example at the end of Section 3 also illustrates. In this paper, we translate the EAH algorithm into automata theory.
Before ending this introductory section, let us recall some basic notions and notations used throughout the paper. We denote by $|S|$ the cardinality of the set $S$; if $x$ is a string of finite length, then $|x|$ denotes the length of $x$. The empty string is denoted by $\lambda$.

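For an alphabet $A$, we denote by $A^*$ the set $\bigcup_{n \ge 0} A^n$, and by $A^+$ the set $\bigcup_{n \ge 1} A^n$, where $A^0$ denotes the set $\{\lambda\}$. Also, we denote by $A^{\le n}$ the set $\bigcup_{i=0}^{n} A^i$, and by $A^{\ge n}$ the set $\bigcup_{i \ge n} A^i$.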

Let $X$ be a finite and nonempty subset of $A^+$, and $w \in A^+$. A decomposition of $w$ over $X$ is any sequence of words $u_1, u_2, \ldots, u_h$ with $u_i \in X$, $1 \le i \le h$, such that $w = u_1u_2\ldots u_h$. A code over $A$ is any nonempty set $C \subseteq A^+$ such that each word $w \in A^+$ has at most one decomposition over $C$. A prefix code over $A$ is any code $C$ over $A$ such that no word in $C$ is a proper prefix of another word in $C$. If $u, v$ are two strings, then we denote by $u \cdot v$, or simply by $uv$, the catenation of $u$ and $v$.
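These definitions can be checked mechanically on small examples. The Python sketch below (the function names `is_prefix_code` and `count_decompositions` are ours, for illustration only, and words are assumed to be Python strings) tests the prefix property of a finite set and counts the decompositions of one given word over a set $X$. A count greater than one on some word shows that $X$ is not a code; counts of at most one on the words tried are evidence, not a proof, that $X$ is a code.

```python
from functools import lru_cache


def is_prefix_code(words):
    """True if no word in `words` is a proper prefix of another word in `words`."""
    ws = set(words)
    return not any(u != v and v.startswith(u) for u in ws for v in ws)


def count_decompositions(w, X):
    """Number of sequences u_1, ..., u_h of words of X with w = u_1 u_2 ... u_h."""
    X = frozenset(X)
    if "" in X:
        raise ValueError("X must be a subset of A+ (no empty word)")

    @lru_cache(maxsize=None)
    def count(start):
        # Number of decompositions of the suffix w[start:] over X.
        if start == len(w):
            return 1
        return sum(count(start + len(u)) for u in X if w.startswith(u, start))

    return count(0)


print(is_prefix_code({"0", "10", "11"}))               # True
print(count_decompositions("010", {"0", "01", "10"}))  # 2: 0|10 and 01|0
```

Memoizing on the start position keeps the count polynomial in $|w|$ even when the number of decompositions grows quickly.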



2 A Short Review of Adaptive Codes 

Adaptive codes have been recently introduced in [5] as a new class of non-standard variable-length codes. The aim of this section is to briefly review some basic definitions, results, and notations directly related to this class of codes.

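Definition 1 Let $\Sigma$ and $\Delta$ be alphabets. A function $c : \Sigma \times \Sigma^{\le n} \to \Delta^+$, $n \ge 1$, is called an adaptive code of order $n$ if its unique homomorphic extension $\bar{c} : \Sigma^* \to \Delta^*$ defined by:

• $\bar{c}(\lambda) = \lambda$

• $\bar{c}(\sigma_1\sigma_2\ldots\sigma_m) = c(\sigma_1,\lambda)\,c(\sigma_2,\sigma_1)\ldots c(\sigma_{n-1},\sigma_1\sigma_2\ldots\sigma_{n-2})\,c(\sigma_n,\sigma_1\sigma_2\ldots\sigma_{n-1})\,c(\sigma_{n+1},\sigma_1\sigma_2\ldots\sigma_n)\,c(\sigma_{n+2},\sigma_2\sigma_3\ldots\sigma_{n+1})\,c(\sigma_{n+3},\sigma_3\sigma_4\ldots\sigma_{n+2})\ldots c(\sigma_m,\sigma_{m-n}\sigma_{m-n+1}\ldots\sigma_{m-1})$

for all $\sigma_1\sigma_2\ldots\sigma_m \in \Sigma^+$, is injective.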

As specified by the definition above, an adaptive code of order $n$ associates variable-length codewords to symbols being encoded depending on the previous $n$ symbols in the input data string (for each of the first $n$ symbols, on all the symbols preceding it). Let us take an example in order to better understand this adaptive mechanism.
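The only thing the adaptive mechanism needs at each position is the context, i.e., the (at most) $n$ preceding symbols. A minimal Python sketch, assuming symbols are single characters and writing the empty context $\lambda$ as the empty string (the name `adaptive_contexts` is ours):

```python
def adaptive_contexts(x, n):
    """Context used for each symbol of x by an adaptive code of order n.

    The i-th symbol (0-based) is encoded relative to the at most n symbols
    that precede it, so the first symbol gets the empty context (lambda).
    """
    if n < 1:
        raise ValueError("the order n must be at least 1")
    return [x[max(0, i - n):i] for i in range(len(x))]


for symbol, context in zip("abacca", adaptive_contexts("abacca", 2)):
    print(f"c({symbol}, {context or 'λ'})")
```

For the string $abacca$ and $n = 2$ this prints the pairs $(a,\lambda), (b,a), (a,ab), (c,ba), (c,ac), (a,cc)$, i.e., exactly the arguments of $c$ in the definition of $\bar{c}$.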

Example 2 This example makes use of an adaptive code of order two. Consider the alphabets $\Sigma = \{a,b,c\}$ and $\Delta = \{0,1\}$, and the function $c : \Sigma \times \Sigma^{\le 2} \to \Delta^+$ defined by Table 1. One can easily verify that $\bar{c}$ is injective, so, according to Definition 1, $c$ is an adaptive code of order two.
Table 1. An adaptive code of order two (row: symbol $\sigma$ being encoded; column: context $u \in \Sigma^{\le 2}$; entry: $c(\sigma, u)$).

     a    b    c    aa   ab   ac   ba   bb   bc   ca   cb   cc   λ
a    01   10   10   00   11   10   01   10   11   11   11   00   00
b    10   00   11   11   01   00   00   11   01   10   00   10   11
c    11   01   01   10   00   11   11   00   00   00   10   11
Let $x = abacca \in \Sigma^+$ be an input data string. Using the definition above, we encode $x$ by $\bar{c}(x) = c(a,\lambda)\,c(b,a)\,c(a,ab)\,c(c,ba)\,c(c,ac)\,c(a,cc) = 001011111100$.
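A short Python sketch reproduces this computation. It transcribes Table 1 assuming the column order a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, λ (our reading of the extracted table, chosen because it reproduces the bitstring 001011111100 above); the entry $c(c,\lambda)$ is not legible in the source and is left out, which does not affect this example. The names `TABLE` and `encode` are ours.

```python
# Table 1, transcribed row by row: one row per encoded symbol, one column per context.
# "" stands for the empty context lambda; the entry c(c, lambda) is not legible in
# the source, so the row for c has only 12 entries.
CONTEXTS = ["a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", ""]
ROWS = {
    "a": "01 10 10 00 11 10 01 10 11 11 11 00 00",
    "b": "10 00 11 11 01 00 00 11 01 10 00 10 11",
    "c": "11 01 01 10 00 11 11 00 00 00 10 11",
}
# zip stops at the shorter sequence, so c(c, lambda) is simply absent.
TABLE = {
    (symbol, context): codeword
    for symbol, row in ROWS.items()
    for context, codeword in zip(CONTEXTS, row.split())
}


def encode(x, n=2, table=TABLE):
    """Extension of an adaptive code of order n: concatenate c(x_i, context_i)."""
    return "".join(table[(s, x[max(0, i - n):i])] for i, s in enumerate(x))


print(encode("abacca"))  # 001011111100
```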

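Let $c : \Sigma \times \Sigma^{\le n} \to \Delta^+$ be an adaptive code of order $n$, $n \ge 1$. We denote by $C_{c,\sigma_1\sigma_2\ldots\sigma_h}$ the set $\{c(\sigma, \sigma_1\sigma_2\ldots\sigma_h) \mid \sigma \in \Sigma\}$, for all $\sigma_1\sigma_2\ldots\sigma_h \in \Sigma^{\le n} - \{\lambda\}$, and by $C_{c,\lambda}$ the set $\{c(\sigma,\lambda) \mid \sigma \in \Sigma\}$. We write $C_{\sigma_1\sigma_2\ldots\sigma_h}$ instead of $C_{c,\sigma_1\sigma_2\ldots\sigma_h}$, and $C_\lambda$ instead of $C_{c,\lambda}$, whenever there is no confusion. Let us denote by $\mathrm{AC}(\Sigma,\Delta,n)$ the set $\{c : \Sigma \times \Sigma^{\le n} \to \Delta^+ \mid c \text{ is an adaptive code of order } n\}$. The proof of the following theorem can be found in [5].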

Theorem 3 Let $\Sigma$ and $\Delta$ be two alphabets, and $c : \Sigma \times \Sigma^{\le n} \to \Delta^+$ a function, $n \ge 1$. If $C_u$ is a prefix code for all $u \in \Sigma^{\le n}$, then $c \in \mathrm{AC}(\Sigma,\Delta,n)$.
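The prefix condition of Theorem 3 is also what makes decoding straightforward: once the previous (at most) $n$ decoded symbols are known, the codeword set of that context is a prefix code, so at most one of its codewords can match the next bits. The following Python decoder for the code of Table 1 illustrates this; it assumes the same transcription and column order of Table 1 as in Example 2 (with the illegible entry $c(c,\lambda)$ omitted), and all names are ours.

```python
CONTEXTS = ["a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", ""]
ROWS = {
    "a": "01 10 10 00 11 10 01 10 11 11 11 00 00",
    "b": "10 00 11 11 01 00 00 11 01 10 00 10 11",
    "c": "11 01 01 10 00 11 11 00 00 00 10 11",  # c(c, lambda) omitted (illegible)
}
TABLE = {
    (symbol, context): codeword
    for symbol, row in ROWS.items()
    for context, codeword in zip(CONTEXTS, row.split())
}


def codeword_sets(table):
    """Map each context u to {codeword: symbol}, i.e. the set C_u with its symbols."""
    sets = {}
    for (symbol, context), codeword in table.items():
        sets.setdefault(context, {})[codeword] = symbol
    return sets


def is_prefix_code(words):
    ws = set(words)
    return not any(u != v and v.startswith(u) for u in ws for v in ws)


def decode(bits, n=2, table=TABLE):
    sets = codeword_sets(table)
    decoded, pos = "", 0
    while pos < len(bits):
        context = decoded[-n:]  # at most n previous symbols ("" at the start)
        for codeword, symbol in sets[context].items():
            if bits.startswith(codeword, pos):
                decoded += symbol
                pos += len(codeword)
                break  # C_u is a prefix code here, so no other codeword can match
        else:
            raise ValueError(f"no codeword of context {context!r} matches at bit {pos}")
    return decoded


print(all(is_prefix_code(s) for s in codeword_sets(TABLE).values()))  # True
print(decode("001011111100"))  # abacca
```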



3 Translating the EAH Encoder into Automata Theory 



In this section, we focus on translating the EAH data compression algorithm [7] into automata theory. Before translating the algorithm, some new definitions are needed. For further details on formal languages, the reader is referred to [3].

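Definition 4 Let $\Sigma$ be an alphabet such that $\star \notin \Sigma$, and let $w = u_1u_2\ldots u_h \in \Sigma^+$, with $u_i \in \Sigma$ for all $i$. The adaptive automaton of order $n$ associated to $w$ is the nondeterministic finite automaton $A_n(w) = (S(w,n), T(w,n), \delta_{w,n}, s_0(w,n), F(w,n))$, where $S(w,n)$, $T(w,n)$, $\delta_{w,n}$, $s_0(w,n)$, and $F(w,n)$ are defined by:

- $S(w,n) = \{u_j \mid n+1 \le j \le h\} \cup \{u_ju_{j+1}\ldots u_{j+n-1} \mid 1 \le j \le h-n\} \cup \{\star\}$;

- $$\begin{aligned} T(w,n) = {} & \{(a, code) \mid \exists j,\ 1 \le j \le h-n, \text{ such that } a = |\{k \mid 1 \le k \le h-n,\ u_k\ldots u_{k+n} = u_j\ldots u_{j+n}\}| \\ & \quad \text{and } code \in \{0,1\}^+ \text{ is the codeword associated to } u_{j+n} \text{ when the previous } n \text{ symbols are } u_j\ldots u_{j+n-1}\} \\ & \cup \{(a, \lambda) \mid \exists j,\ n \le j \le h, \text{ such that } a = |\{k \mid n \le k \le h,\ u_{k-n+1}\ldots u_k = u_{j-n+1}\ldots u_j\}|\}; \end{aligned}$$

- $\delta_{w,n} : S(w,n) \times T(w,n) \to \mathcal{P}(S(w,n))$ is given by:
$$\delta_{w,n}(s,(a,c)) = \begin{cases} \{\star\} & \text{if } s = \star,\\ Z_n(s,a,c) & \text{if } s \ne \star \text{ and } Z_n(s,a,c) \ne \emptyset,\\ \{\star\} & \text{if } s \ne \star \text{ and } Z_n(s,a,c) = \emptyset; \end{cases}$$

- $s_0(w,n) = u_1u_2\ldots u_n$;

- $F(w,n) = \{u_{h-n+1}u_{h-n+2}\ldots u_h\}$;

and $Z_n(s,a,c)$ is given by:
$$Z_n(s,a,c) = \begin{cases} \{p \mid \exists j : u_ju_{j+1} = sp,\ p \text{ is encoded by } c,\ a = |\{k \mid u_ku_{k+1} = sp\}|\} & \text{if } n = 1,\ |s| = 1,\\ \{p \mid \exists j : u_j\ldots u_{j+n} = sp,\ |p| = 1,\ p \text{ is encoded by } c,\ a = |\{k \mid u_k\ldots u_{k+n} = sp\}|\} & \text{if } n \ge 2,\ |s| \ge 2,\\ \{p \mid \exists j : u_{j-n+1}\ldots u_j = p,\ u_j = s,\ c = \lambda,\ a = |\{k \mid u_{k-n+1}\ldots u_k = p,\ u_k = s\}|\} & \text{if } n \ge 2,\ |s| = 1. \end{cases}$$

Here, "$p$ is encoded by $c$" means that $c$ is the codeword associated to $p$ when the previous $n$ symbols are $s$. Thus $\star$ acts as a dead state: from a state $s$, a label $(a,c)$ that does not describe any continuation of $s$ in $w$ leads to $\star$, and the automaton never leaves $\star$.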



Example 5 Let $\Sigma = \{a,b,c,d\}$ be an alphabet, and $w = abdbacdba \in \Sigma^+$. It is easy to verify that the adaptive automaton of order one associated to $w$ is the one shown in Fig. 1 (the algorithm which associates the codewords is not important in this example).



Fig. 1. $A_1(abdbacdba)$
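For order one, the frequency component $a$ of every transition label $(a, code)$ leaving a state $s$ toward a successor $p$ is simply the number of occurrences of the pair $sp$ of consecutive symbols in $w$. A Python sketch computing these frequencies for the string of this example (the function name is ours); the codeword component is left out because, as noted above, it depends on the chosen code assignment (Huffman coding in EAH).

```python
from collections import Counter


def order_one_frequencies(w):
    """For each state s, map each successor p to |{k : u_k u_{k+1} = s p}|."""
    pair_counts = Counter(zip(w, w[1:]))  # consecutive pairs u_k u_{k+1}
    table = {}
    for (s, p), a in pair_counts.items():
        table.setdefault(s, {})[p] = a
    return table


print(order_one_frequencies("abdbacdba"))
# {'a': {'b': 1, 'c': 1}, 'b': {'d': 1, 'a': 2}, 'd': {'b': 2}, 'c': {'d': 1}}
```

The frequencies 1 and 2 obtained here are the first components of the labels $(1,\cdot)$ and $(2,\cdot)$ of $A_1(abdbacdba)$.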



Let $U = (u_1, u_2, \ldots, u_k)$ be a $k$-tuple. We denote by $(u_1, u_2, \ldots, u_k).i$ the $i$-th component of $U$, that is, $u_i = (u_1, u_2, \ldots, u_k).i$, for all $i$, $1 \le i \le k$. The 0-tuple is denoted by $()$. The length of the tuple $U$ is denoted by $\mathrm{Len}(U)$. If $V = (v_1, v_2, \ldots, v_t)$, $M = (m_1, m_2, \ldots, m_r, U)$, $N = (n_1, n_2, \ldots, n_s, V)$, $P = (p_1, \ldots, p_{i-1}, p_i, p_{i+1}, \ldots, p_t)$ are tuples and $q$ is an element or a tuple, then we define $P \triangleleft q$, $P \triangleright i$, $U \triangle V$, and $M \diamond N$ by:

• $P \triangleleft q = (p_1, \ldots, p_t, q)$

• $P \triangleright i = (p_1, \ldots, p_{i-1}, p_{i+1}, \ldots, p_t)$

• $U \triangle V = (u_1, u_2, \ldots, u_k, v_1, v_2, \ldots, v_t)$

• $M \diamond N = (m_1 + n_1, m_2 + 1, \ldots, m_r + 1, n_2 + 1, \ldots, n_s + 1, U \triangle V)$
where $m_i$, $n_j$ are integers, $1 \le i \le r$ and $1 \le j \le s$. If $(f_1, \ldots, f_k)$ is a tuple of integers, let us denote by $\mathrm{Huffman}(f_1, \ldots, f_k)$ the Huffman algorithm, which returns a tuple $((c_1, l_1), \ldots, (c_k, l_k))$, where $c_i$ is the codeword associated to the symbol with the frequency $f_i$ and $l_i$ is the length of $c_i$, $1 \le i \le k$.



Algorithm 1

Huffman(tuple F = (f_1, f_2, ..., f_k), where k ≥ 1)

1. L := ((f_1, 0, (1)), (f_2, 0, (2)), ..., (f_k, 0, (k)));

2. Let V = (λ, λ, ..., λ) be a k-tuple;

3. if k = 1 then V.1 := 0;

4. while Len(L) > 1 do
begin

4.1. Let i < j be such that 1 ≤ i, j ≤ Len(L) and L.i.1, L.j.1 are the two smallest elements of the multiset {L.q.1 | 1 ≤ q ≤ Len(L)};

4.2. First := {L.i.Len(L.i).r | 1 ≤ r ≤ Len(L.i.Len(L.i))};

4.3. Second := {L.j.Len(L.j).r | 1 ≤ r ≤ Len(L.j.Len(L.j))};

4.4. for each x ∈ First do V.x := 0 · V.x;

4.5. for each x ∈ Second do V.x := 1 · V.x;

4.6. W := L.i ◇ L.j; L := L ▷ j; L := L ▷ i; L := L ◁ W;
end

5. return ((V.1, |V.1|), ..., (V.k, |V.k|));
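A Python sketch of Algorithm 1 follows. For readability it keeps, for each element of the list L, only its total frequency and the tuple of leaf indices below it (the last component, which is all that steps 4.2 and 4.3 use), instead of the full ◇ bookkeeping; ties between equal frequencies are broken by position in L, a choice the pseudocode leaves open. The name `huffman` and these simplifications are ours.

```python
def huffman(freqs):
    """Return ((c_1, l_1), ..., (c_k, l_k)) for the frequencies (f_1, ..., f_k)."""
    k = len(freqs)
    if k < 1:
        raise ValueError("Huffman needs at least one frequency")
    # Step 1: each element is (total frequency, indices of the leaves below it).
    nodes = [(f, (q,)) for q, f in enumerate(freqs)]
    # Steps 2-3: V holds the codewords under construction.
    codes = [""] * k
    if k == 1:
        codes[0] = "0"
    # Step 4: repeatedly merge the two elements with the smallest frequencies.
    while len(nodes) > 1:
        two_smallest = sorted(range(len(nodes)), key=lambda q: (nodes[q][0], q))[:2]
        i, j = sorted(two_smallest)          # step 4.1 requires i < j
        for x in nodes[i][1]:                # steps 4.2 and 4.4
            codes[x] = "0" + codes[x]
        for x in nodes[j][1]:                # steps 4.3 and 4.5
            codes[x] = "1" + codes[x]
        merged = (nodes[i][0] + nodes[j][0], nodes[i][1] + nodes[j][1])
        del nodes[j]                         # step 4.6: remove j first so i stays valid
        del nodes[i]
        nodes.append(merged)
    # Step 5.
    return tuple((c, len(c)) for c in codes)


print(huffman((1, 1)))  # (('0', 1), ('1', 1))
```

Removing position j before position i is exactly why step 4.1 asks for i < j: deleting the larger index first leaves the smaller one unchanged.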



Let $\Sigma = \{\sigma_0, \sigma_1, \ldots, \sigma_{m-1}\}$ be an alphabet, $1 \le m \le 256$. Let us recall the idea of our algorithm, denoted by $\mathrm{EAH}_n$. Let $w = w_1w_2\ldots w_h$ be a string over $\Sigma$. We encode $w$ by a 5-tuple $U = (A, B, C, D, E)$, where $A, B, C, D, E$ are bitstrings constructed as below.

$\mathrm{EAH}_n(w).1 = A$. Let $\mathrm{Index} : \Sigma \to \{0, 1, \ldots, m-1\}$ be the function which gets as input a symbol $\sigma \in \Sigma$ and outputs the index $i$ such that $\sigma = \sigma_i$. If $h \ge n$, then $A = \mathrm{Base10Base2}(\mathrm{Index}(w_1)) \ldots \mathrm{Base10Base2}(\mathrm{Index}(w_n))$, that is, $A$ is the conversion of the sequence $\mathrm{Index}(w_1), \ldots, \mathrm{Index}(w_n)$ from base 10 to base 2, each index being written on $\lceil \log_2 m \rceil$ bits, so that $|A| = n \cdot \lceil \log_2 m \rceil$. Otherwise, if $h < n$, then we consider $A = \mathrm{Base10Base2}(\mathrm{Index}(w_1)) \ldots \mathrm{Base10Base2}(\mathrm{Index}(w_h))$, that is, $A$ is the conversion of $\mathrm{Index}(w_1), \ldots, \mathrm{Index}(w_h)$ from base 10 to base 2, and $|A| = h \cdot \lceil \log_2 m \rceil$.
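A Python sketch of $\mathrm{EAH}_n(w).1$, assuming the alphabet is given as the ordered string $\sigma_0\sigma_1\ldots\sigma_{m-1}$ of single characters and that each index is written on $\lceil \log_2 m \rceil$ bits, as the length formula above implies (the name `component_a` is ours):

```python
def component_a(w, sigma, n):
    """EAH_n(w).1: indices of the first min(n, h) symbols, each on ceil(log2 m) bits."""
    m = len(sigma)
    width = (m - 1).bit_length()  # equals ceil(log2 m) for m >= 1, without floats
    if width == 0:                # one-symbol alphabet: the indices carry no information
        return ""
    index = {s: i for i, s in enumerate(sigma)}
    # w[:n] covers both cases h >= n and h < n.
    return "".join(format(index[s], "b").zfill(width) for s in w[:n])


print(component_a("abedc", "abcde", 2))  # 000001
```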

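$\mathrm{EAH}_n(w).2 = B$. $B = B_0B_1\ldots B_{m^n-1}$, where $B_j$ is defined by:

$$B_j = \begin{cases} 1 & \text{if } \exists\, i \in \{0,\ldots,m-1\} \text{ and } \exists\, k \in \{1,\ldots,h-n\} \text{ such that } \mathrm{Index}^{-1}(10_m(j).1)\ldots\mathrm{Index}^{-1}(10_m(j).n)\cdot\sigma_i = w_k\ldots w_{k+n},\\ 0 & \text{otherwise,} \end{cases}$$

for all $j$, $0 \le j \le m^n - 1$, and $10_m(j)$ is a tuple of length $n$ denoting the conversion of $j$ from base 10 to base $m$ (padded with leading zeros), such that $\mathrm{Len}(10_m(j)) = n$ and $10_m(j).i$ is the $i$-th digit (from left to right) of this conversion. In other words, $B_j = 1$ exactly when the context of index $j$ is followed by at least one symbol somewhere in $w$.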
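A Python sketch of $\mathrm{EAH}_n(w).2$, with the same assumptions on the alphabet as for $A$; `base_m_digits` plays the role of $10_m(j)$, and both function names are ours:

```python
def base_m_digits(j, m, n):
    """10_m(j): the n digits of j in base m, most significant first (zero-padded)."""
    digits = []
    for _ in range(n):
        j, d = divmod(j, m)
        digits.append(d)
    return digits[::-1]


def component_b(w, sigma, n):
    """EAH_n(w).2: B_j = 1 iff the context of index j is followed by a symbol in w."""
    m = len(sigma)
    # Contexts w_k ... w_{k+n-1} with 1 <= k <= h - n, i.e. those followed by a symbol.
    followed = {w[k:k + n] for k in range(len(w) - n)}
    bits = []
    for j in range(m ** n):
        context = "".join(sigma[d] for d in base_m_digits(j, m, n))
        bits.append("1" if context in followed else "0")
    return "".join(bits)


print(component_b("abab", "abc", 1))  # 110
print(component_b("abab", "abc", 2))  # 010100000
```

For $n = 2$ the contexts are enumerated as aa, ab, ac, ba, bb, bc, ca, cb, cc, and only ab and ba occur followed by a symbol in abab.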



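$\mathrm{EAH}_n(w).3 = C$. $C = C_0C_1\ldots C_{m-1}$, where $C_i = C_i^0C_i^1\ldots C_i^{m^n-1}$, $0 \le i \le m-1$, and for all $j$, $0 \le j \le m^n-1$, $C_i^j \in \{0,1\}^*$ is defined by:

$$C_i^j = \begin{cases} 1 & \text{if } B_j = 1 \text{ and } \exists\, k \in \{1,\ldots,h-n\} \text{ such that } \mathrm{Index}^{-1}(10_m(j).1)\ldots\mathrm{Index}^{-1}(10_m(j).n)\cdot\sigma_i = w_k\ldots w_{k+n},\\ 0 & \text{if } B_j = 1 \text{ and } \nexists\, k \in \{1,\ldots,h-n\} \text{ such that } \mathrm{Index}^{-1}(10_m(j).1)\ldots\mathrm{Index}^{-1}(10_m(j).n)\cdot\sigma_i = w_k\ldots w_{k+n},\\ \lambda & \text{if } B_j = 0. \end{cases}$$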
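A Python sketch of $\mathrm{EAH}_n(w).3$ under the same assumptions as the sketches for $A$ and $B$ (names ours). Contexts with $B_j = 0$ contribute $\lambda$, i.e., nothing, to $C$:

```python
def base_m_digits(j, m, n):
    """10_m(j): the n digits of j in base m, most significant first (zero-padded)."""
    digits = []
    for _ in range(n):
        j, d = divmod(j, m)
        digits.append(d)
    return digits[::-1]


def component_c(w, sigma, n):
    """EAH_n(w).3 = C_0 C_1 ... C_{m-1}, with C_i^j written only when B_j = 1."""
    m = len(sigma)
    grams = {w[k:k + n + 1] for k in range(len(w) - n)}  # factors w_k ... w_{k+n}
    followed = {g[:n] for g in grams}                    # contexts with B_j = 1
    bits = []
    for i in range(m):                 # C_i, for the symbol sigma_i ...
        for j in range(m ** n):        # ... and every context of index j
            context = "".join(sigma[d] for d in base_m_digits(j, m, n))
            if context in followed:    # C_i^j is lambda when B_j = 0
                bits.append("1" if context + sigma[i] in grams else "0")
    return "".join(bits)


print(component_c("abab", "abc", 1))  # 011000
```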



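$\mathrm{EAH}_n(w).4 = D$. $D = D_0D_1\ldots D_{m-1}$, where $D_i = D_i^0D_i^1\ldots D_i^{m^n-1}$, $0 \le i \le m-1$, and $D_i^j \in \{0,1\}^*$ is defined by:

$$D_i^j = \begin{cases} \mathrm{MBase10Base2}(\mathrm{Freq}(\sigma_i, j)) & \text{if } C_i^j = 1,\\ \lambda & \text{if } C_i^j \ne 1, \end{cases}$$

where $\mathrm{Freq}(\sigma_i, j) = |\{k \mid w_k\ldots w_{k+n} = \sigma_{10_m(j).1}\ldots\sigma_{10_m(j).n}\sigma_i\}|$.

Let us denote by $\mathit{Marked}$ the set $\{(i,j) \mid C_i^j = 1\}$. The greatest element of the set $\{|\mathrm{Base10Base2}(\mathrm{Freq}(\sigma_i,j))| \mid (i,j) \in \mathit{Marked}\}$ is denoted by $\mathit{Max}$. Then, $\mathrm{MBase10Base2}(\mathrm{Freq}(\sigma_i,j))$ is defined by:

$$\mathrm{MBase10Base2}(\mathrm{Freq}(\sigma_i,j)) = \underbrace{0\ldots0}_{t(i,j)}\,\mathrm{Base10Base2}(\mathrm{Freq}(\sigma_i,j)),$$

where $t(i,j) = \mathit{Max} - |\mathrm{Base10Base2}(\mathrm{Freq}(\sigma_i,j))|$, that is, every frequency is left-padded with zeros to $\mathit{Max}$ bits.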
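A Python sketch of $\mathrm{EAH}_n(w).4$ under the same assumptions as above (names ours). It uses the fact that $C_i^j = 1$ exactly when $\mathrm{Freq}(\sigma_i, j) > 0$, and writes the marked frequencies in the order $i$ outer, $j$ inner, each padded to $\mathit{Max}$ bits:

```python
from collections import Counter


def base_m_digits(j, m, n):
    """10_m(j): the n digits of j in base m, most significant first (zero-padded)."""
    digits = []
    for _ in range(n):
        j, d = divmod(j, m)
        digits.append(d)
    return digits[::-1]


def component_d(w, sigma, n):
    """EAH_n(w).4: Freq(sigma_i, j) for every marked (i, j), zero-padded to Max bits."""
    m = len(sigma)
    freq = Counter(w[k:k + n + 1] for k in range(len(w) - n))  # Freq over (n+1)-factors
    marked = []  # frequencies of the pairs (i, j) with C_i^j = 1, in the order of D
    for i in range(m):
        for j in range(m ** n):
            context = "".join(sigma[d] for d in base_m_digits(j, m, n))
            f = freq[context + sigma[i]]  # Counter returns 0 for absent keys
            if f > 0:
                marked.append(f)
    if not marked:
        return ""
    max_bits = max(len(format(f, "b")) for f in marked)            # Max
    return "".join(format(f, "b").zfill(max_bits) for f in marked)  # 0^{t(i,j)} Base10Base2


print(component_d("abab", "abc", 1))  # 0110
```

Here ba occurs once and ab twice, so the two marked frequencies 1 and 2 are written as 01 and 10.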

$\mathrm{EAH}_n(w).5 = E$. $E$ denotes the compression of $w$ using $A, B, C, D$, the adaptive automaton of order $n$ associated to $w$, and the Huffman algorithm.

Algorithm 2

EAH_n(string w = w_1w_2...w_h ∈ Σ^+, such that w_i ∈ Σ, 1 ≤ i ≤ h, h > n)

1. A := λ; B := λ; C := λ; D := λ; E := λ; A_n(w).1 := {★}; A_n(w).2 := ∅;

2. A_n(w).4 := w_1...w_n; A_n(w).5 := {w_{h−n+1}...w_h}; M_1 := (); M_2 := ();

3. for j = n + 1 to h do A_n(w).1 := A_n(w).1 ∪ {w_j};

4. for j = 1 to h − n do A_n(w).1 := A_n(w).1 ∪ {w_j w_{j+1}...w_{j+n−1}};

5. for j = n + 1 to h do

6.    if ∃ i such that M_1.i.1 = w_{j−n}...w_{j−1} and M_1.i.2 = w_j

7.    then M_1.i.3 := M_1.i.3 + 1;

8.    else M_1 := M_1 ◁ (w_{j−n}...w_{j−1}, w_j, 1, λ);

9. X := |{i | M_1.i.4 = λ}|;

10. while X > 0 do
begin

10.1. pos := min{i | M_1.i.4 = λ}; U := (M_1.pos); V := (pos);

10.2. for j = pos + 1 to Len(M_1) do

10.3.    if M_1.j.1 = M_1.pos.1 then
         begin

10.3.1.     U := U ◁ M_1.j;

10.3.2.     V := V ◁ j;
         end

10.4. for j = 1 to Len(U) do U.j.4 := Huffman((U.1.3, ..., U.Len(U).3)).j.1;

10.5. for j = 1 to Len(V) do M_1.(V.j).4 := U.j.4;

10.6. X := |{i | M_1.i.4 = λ}|;
end

11. for j = n to h do

12.    if ∃ i such that M_2.i.1 = w_j and M_2.i.2 = w_{j−n+1}...w_j

13.    then M_2.i.3 := M_2.i.3 + 1;

14.    else M_2 := M_2 ◁ (w_j, w_{j−n+1}...w_j, 1, λ);

15. for j = 1 to Len(M_1) do A_n(w).2 := A_n(w).2 ∪ {(M_1.j.3, M_1.j.4)};

16. for j = 1 to Len(M_2) do A_n(w).2 := A_n(w).2 ∪ {(M_2.j.3, M_2.j.4)};

17. for each t ∈ A_n(w).2 do A_n(w).3(★, t) := {★};

18. for each (s, t) ∈ (A_n(w).1 − {★}) × A_n(w).2 do A_n(w).3(s, t) := ∅;

19. for each (s, t) ∈ (A_n(w).1 − {★}) × A_n(w).2 do

20.    for j = 1 to Len(M_1 △ M_2) do

21.       if (M_1 △ M_2).j.1 = s and t = ((M_1 △ M_2).j.3, (M_1 △ M_2).j.4)

22.       then A_n(w).3(s, t) := A_n(w).3(s, t) ∪ {(M_1 △ M_2).j.2};

23. for each (s, t) ∈ (A_n(w).1 − {★}) × A_n(w).2 do

24.    if A_n(w).3(s, t) = ∅

25.    then A_n(w).3(s, t) := {★};

26. A := Base10Base2(Index(w_1))...Base10Base2(Index(w_n));

27. for j = 0 to m^n − 1 do

28.    if ∃ t ∈ A_n(w).2 such that A_n(w).3(σ_{10_m(j).1}...σ_{10_m(j).n}, t) − {★} ≠ ∅

29.    then B := B · 1;

30.    else B := B · 0;

31. for i = 0 to m − 1 do

32.    for j = 0 to m^n − 1 do

33.       if B_j = 1 then

34.          if ∃ t such that σ_i ∈ A_n(w).3(σ_{10_m(j).1}...σ_{10_m(j).n}, t) then
             begin

34.1.           C := C · 1;

34.2.           D := D · MBase10Base2(t.1);
             end

35.          else C := C · 0;

36. for i = n + 1 to h do
begin

36.1. Let t ∈ A_n(w).2 be such that w_i ∈ A_n(w).3(w_{i−n}...w_{i−1}, t);

36.2. E := E · t.2;
end

37. return (A, B, C, D, E);
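To make the construction of $E$ concrete, the following Python sketch follows steps 5-10 and 36 of Algorithm 2 directly on the table $M_1$, without materializing the automaton: it counts each (context, symbol) pair in order of first appearance, builds one Huffman code per context over those counts, and concatenates the codeword of every symbol $w_{n+1}, \ldots, w_h$ given its context. The helper `huffman_codes` mirrors Algorithm 1 with ties broken by position; both names and the tie-breaking rule are our assumptions.

```python
def huffman_codes(freqs):
    """Codewords for the frequencies freqs, built as in Algorithm 1 (ties by position)."""
    if len(freqs) == 1:
        return ["0"]
    codes = [""] * len(freqs)
    nodes = [(f, (q,)) for q, f in enumerate(freqs)]
    while len(nodes) > 1:
        i, j = sorted(sorted(range(len(nodes)), key=lambda q: (nodes[q][0], q))[:2])
        for x in nodes[i][1]:
            codes[x] = "0" + codes[x]
        for x in nodes[j][1]:
            codes[x] = "1" + codes[x]
        merged = (nodes[i][0] + nodes[j][0], nodes[i][1] + nodes[j][1])
        del nodes[j]
        del nodes[i]
        nodes.append(merged)
    return codes


def component_e(w, n):
    """EAH_n(w).5: codewords of w_{n+1} ... w_h, one Huffman code per context."""
    # Steps 5-8: M_1 as a dict (context, symbol) -> frequency, in order of first appearance.
    m1 = {}
    for j in range(n, len(w)):
        key = (w[j - n:j], w[j])
        m1[key] = m1.get(key, 0) + 1
    # Steps 9-10: group the entries by context and run Huffman on their frequencies.
    codeword = {}
    for context in dict.fromkeys(ctx for ctx, _ in m1):
        keys = [key for key in m1 if key[0] == context]
        for key, cw in zip(keys, huffman_codes([m1[key] for key in keys])):
            codeword[key] = cw
    # Step 36: concatenate the codeword of each symbol given its n predecessors.
    return "".join(codeword[(w[i - n:i], w[i])] for i in range(n, len(w)))


print(component_e("abacca", 1))  # 00101
```

For $w = abacca$ and $n = 1$, the contexts a and c each have two equally frequent successors (codewords 0 and 1), while b has a single successor (codeword 0), which gives 00101.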

Let $u$ be a string. In the remainder of this paper, we denote by $\mathrm{LZ}(u)$ the encoding of $u$ using the Lempel-Ziv data compression algorithm [12], and by $\mathrm{H}(u)$ the encoding of $u$ using the Huffman algorithm. Also, let us consider the following notations:

• $\mathrm{LH}(u) = |\mathrm{H}(u)|$;

• $\mathrm{LLZ}(u) = |\mathrm{LZ}(u)|$;

• $\mathrm{LEAH}_n(u) = \sum_{i=1}^{5} |\mathrm{EAH}_n(u).i|$.

Example 6 Let $\Sigma = \{a,b,c,d,e\}$ be an alphabet, and $w$ the string of length 200 over $\Sigma$ given below (between brackets).

w= [abedcababedccabedcedcababedcedcccabedcabedcedccababedcabedc 
cccedccedccedcababedcabedcedccedcababedcabedccabedcababedcedcccc 
cedcabedcabedccccedcccabedcccedccabedccccabedccababedcabedcedcca 
bedcababedced]



Applying $\mathrm{EAH}_1$ to the input string $w$, we obtain the following adaptive automaton of order one.

Fig. 2. $A_1(w)$



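One can easily verify that we obtain:

$$\mathrm{LH}(w) = 462, \qquad \mathrm{LEAH}_1(w) = 310, \qquad \mathrm{LLZ}(w) = 388,$$

which shows that in this case, $\mathrm{EAH}_1$ substantially outperforms the well-known Lempel-Ziv universal data compression algorithm [12]: its output is about 20% shorter than that of Lempel-Ziv (310 versus 388) and about 33% shorter than that of Huffman coding (310 versus 462).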
4 Concluding Remarks



The EAH data compression algorithm has been proposed in [7], where we have empirically shown that for a large class of input data strings, this algorithm substantially outperforms the Lempel-Ziv data compression algorithm. In this paper, we translated the EAH algorithm into automata theory. Further work on adaptive codes will focus on finding new improvements for the EAH algorithm, as well as other algorithms for data compression based on adaptive codes.



References 



[1] J. Berstel, D. Perrin, Theory of Codes (Academic Press, 1985).

[2] V.S. Pless, W.C. Huffman (Eds.), Handbook of Coding Theory (Elsevier, 1998). 

[3] G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages (Springer-Verlag, 1997).

[4] D. Salomon, Data Compression. The Complete Reference (Springer-Verlag, 1998).

[5] D. Trinca, Adaptive Codes: A New Class of Non-standard Variable-length Codes 
(to appear in Romanian Journal of Information Science and Technology). 

[6] D. Trinca, Towards New Algorithms for Data Compression using Adaptive 
Codes, Proceedings of the 5th International Conference on Information 
Technology: Coding and Computing (Las Vegas, Nevada, USA, 2004) 767-772. 

[7] D. Trinca, EAH: A New Encoder based on Adaptive Codes. Outperforming the LZ77 Encoder, manuscript, 2004.

[8] D. Trinca, Special Cases of Encodings by Generalized Adaptive Codes, manuscript, 2004.

[9] D. Trinca, Meta-EAH: An Adaptive Encoder based on Adaptive Codes. Moving 
between Adaptive Mechanisms, Proceedings of the 3rd International Symposium 
on Information and Communication Technologies (Las Vegas, Nevada, USA, 
2004) to appear. ACM Digital Library. 

[10] F.L. Tiplea, E. Makinen, C. Enea, SE-Systems, Timing Mechanisms and Time-Varying Codes, International Journal of Computer Mathematics 79(10) (2002) 1083-1091.

[11] F.L. Tiplea, E. Makinen, D. Trinca, C. Enea, Characterization Results for Time-Varying Codes, Fundamenta Informaticae 53(2) (2002) 185-198.

[12] J. Ziv, A. Lempel, Compression of individual sequences via variable-rate coding, 
IEEE Transactions on Information Theory vol. IT-24 (1978) 530-536. 






